* * * INTRODUCTION

* * * DATA RETRIEVAL

* * * NATURAL LANGUAGE DATABASE FRONT ENDS

* * * GENERIC DESIGN OBJECTIVES

* * * AN INTRODUCTION TO KARL

* * * GENERIC OBJECTIVES REVISED

* * * DESIGN METHODOLOGY

* * * SPECIFIC NLQS DESIGN OBJECTIVES

* * * KARL NL PROCESSING CAPABILITIES

* * * OVERVIEW OF THE QUERY PROCESSING CYCLE

* * * DATA STRUCTURES

* * * LEXICAL AND GRAMMAR ANALYSIS

* * * SYNTACTIC ANALYSIS

* * * SEMANTIC ANALYSIS

* * * FORMAL QUERY GENERATION AND EVALUATION

* * * SYSTEM INTERCONNECTIONS

* * * ANNOTATED EXAMPLES

* * * EVALUATION OF OBJECTIVES

* * * FUNCTIONAL EVALUATION

* * * CURRENT STATUS AND FUTURE WORK

* * * CONCLUSIONS
* * * APPLICABILITY OF COMPUTERS IN ALMOST
EVERY HUMAN ACTIVITY

* * * MORE APPLICATIONS ARE BEING DEVELOPED

* * * MORE NON-EXPERTS NEED ACCESS TO COMPUTERS

* * * LACK OF COMPUTER LITERACY AMONG
MANY CURRENT CASUAL USERS

* * * MOST USERS EXPECT COMPUTERS TO BE
THE "SOLUTION TO ALL PROBLEMS"

* * * FREQUENT USER DISSATISFACTION RESULTS

* * * DEFINITE NEED FOR IMPROVED HUMAN-SYSTEM
COMMUNICATION PROCEDURES
DATA RETRIEVAL

* * * THE INFORMATION AGE IS A REALITY

* * * WIDE VARIETY OF AVAILABLE
TECHNOLOGIES AND SYSTEMS

* * * EARLY DATA RETRIEVAL TECHNIQUES

* * * SIMPLE FILE-BASED SYSTEMS

* * * LARGE APPLICATION PROGRAMS

* * * LACK OF MODERN CAPABILITIES
(E.G., SHARING, INTEGRITY)

* * * FILE MANAGEMENT SYSTEMS

* * * IMPROVED PERFORMANCE

* * * SOME CAPABILITIES FOR:

* * * SHARING

* * * SECURITY

* * * INTEGRITY

* * * STILL, PROGRAMMING WAS NECESSARY
DATA RETRIEVAL (CONT'D)

* * * DATABASE MANAGEMENT SYSTEMS

* * * SUPERIOR TO FILE MANAGEMENT SYSTEMS

* * * DIFFERENT TYPES:

* * * RELATIONAL

* * * HIERARCHICAL

* * * NETWORK

* * * PROVIDE LANGUAGES FOR:

* * * DATA DEFINITION/ORGANIZATION

* * * DATA MANIPULATION/RETRIEVAL

* * * CAPABILITIES FOR:

* * * SECURITY

* * * DATA INDEPENDENCE

* * * DATA REORGANIZATION

* * * SHARING
DATA RETRIEVAL (CONT'D)

ACCESSING A DATABASE:

* * * INTERACTIVELY

* * * BATCH MODE

* * * THROUGH APPLICATION PROGRAMS

INTERACTIVE MODE IS MOST FREQUENT WITH
CASUAL USERS

* * * NO NEED FOR PROGRAMMING

* * * MORE CONVENIENT

* * * STILL REQUIRES FORMAL TRAINING

THERE IS A NEED FOR MORE EFFICIENT
RETRIEVAL LANGUAGES

USER-ORIENTED LANGUAGES ARE MOST APPEALING

* * * NATURAL LANGUAGE DATABASE QUERY SYSTEMS

* * * NON-PROCEDURAL LANGUAGES

* * * NO FORMAL SYNTAX OR SEMANTICS
(SYSTEM LIMITATIONS MAY EXIST)

* * * REDUCED QUERY SIZES

* * * CONSIDERING CASUAL USERS:

* * * MANY USERS LACK THE TIME OR DESIRE
FOR FORMAL TRAINING

* * * USERS LACK SYSTEM KNOWLEDGE

* * * SYSTEM LACKS USER KNOWLEDGE

* * * THE RESULT IS A KNOWLEDGE GAP
NL DATABASE FRONT ENDS

* * * RATIONALE FOR NATURAL LANGUAGE
DATABASE QUERY SYSTEMS:

INCREASED USER EFFICIENCY THROUGH
IMPROVED COMMUNICATION BETWEEN
USER AND SYSTEM

* * * NL QUERIES ARE SIMPLER THAN ANY OTHER
RETRIEVAL ALTERNATIVE
(FORMAL QUERIES, PROGRAMS, ETC.)

* * * EXAMPLE:

FORMAL QUERY:

RANGE OF E IS EMPLOYEE
SELECT (SALARY, NAME)
WHERE (SALARY > 18,000 & SEX = "MALE")
PRINT E

NL QUERY:

PLEASE PRINT THE NAMES AND SALARIES
OF ALL MEN THAT EARN MORE THAN $18,000

FORMAL VERSUS NATURAL QUERY
NL DATABASE FRONT ENDS (CONT'D)

* * * MAJOR ADVANTAGES

* * * INCREASED HUMAN PRODUCTIVITY

* * * INCREASED SYSTEM PRODUCTIVITY
(FEWER ERRORS AND RETRIES)

* * * REDUCED USER FRUSTRATION

* * * VIRTUAL ELIMINATION OF A
TRAINING PERIOD

* * * CUSTOMIZED CAPABILITIES CAN
BE PROVIDED

* * * IMPROVED HANDLING OF NATURAL
LANGUAGE CONCEPTS
(THESAURUS, SYNONYMS, ETC.)

* * * POSSIBLE INTEGRATION INTO A TOTAL
NL FRONT END ENVIRONMENT
NL DATABASE FRONT ENDS (CONT'D)

PROBLEMS WITH NL IMPLEMENTATIONS
ON EXISTING SYSTEMS:

* * * LONG DEVELOPMENT TIMES

* * * RESTRICTED APPLICATION DOMAINS

* * * POOR PORTABILITY BETWEEN
OPERATING SYSTEMS/TOOLS

* * * SOME SYSTEMS DO NOT SUPPORT
PRODUCTION-LEVEL DBMSS

* * * EXTENSIVE RESOURCE UTILIZATION

STILL, EXISTING NLQSS ARE IN HIGH
DEMAND BY USERS AT ALL LEVELS


MV NY PRODUCTION M 


t;ija 


IS AVAILABLE 


* * * ADAPTABILITY TO NEW APPLICATIONS

* * * SYSTEM MUST BE USABLE WITH
NO CODE MODIFICATIONS

* * * PORTABILITY BETWEEN DATABASE
SYSTEMS AND OPERATING SYSTEMS

* * * REDUCED COMPLEXITY

* * * MODULAR, INDEPENDENT DESIGN

* * * SIMPLE IMPLEMENTATION

* * * EFFICIENCY

* * * OPTIMIZED DESIGN

* * * OPTIMIZED IMPLEMENTATION
AN INTRODUCTION TO KARL

* * * KARL IS A:
KNOWLEDGE
ASSISTED
RETRIEVAL
LANGUAGE

* * * RESTRICTED NATURAL LANGUAGE
DATABASE QUERY SYSTEM

* * * KNOWLEDGE-ASSISTED
(OTHER TECHNIQUES ALSO USED)

* * * EXPERIMENTAL VEHICLE
FOR THE DESIGN AND IMPLEMENTATION
OF NATURAL LANGUAGE QUERY SYSTEMS
GENERIC DESIGN OBJECTIVES REVISED

* * * ADAPTABILITY

* * * KNOWLEDGE BASE CAN BE REDEFINED
FOR USE WITH NEW APPLICATIONS

* * * LANGUAGE-RELATED KNOWLEDGE
(LANGUAGE RULES ARE TYPICALLY
INDEPENDENT OF APPLICATION)

* * * PORTABILITY

* * * KARL IS IMPLEMENTED USING:
C PROGRAMMING LANGUAGE
UNIX 4.2 OPERATING SYSTEM
INGRES V7 DBMS

* * * NO SYSTEM-DEPENDENT CALLS

* * * GENERAL EMBEDDED QUERY STRUCTURE

GENERIC DESIGN OBJECTIVES REVISED (CONT'D)

* * * REDUCED COMPLEXITY

* * * COMMON PROGRAMMING LANGUAGE USED

* * * HIGHLY MODULAR DESIGN

* * * PRECISE MODULE INTERFACES

* * * SINGLE-FUNCTION COMPONENTS

* * * EFFICIENCY

* * * NO DYNAMIC MEMORY ALLOCATION

* * * SIMPLE, EFFICIENT ALGORITHMS

* * * USE OF A COMPILED LANGUAGE

* * * REDUCED SUBROUTINE CALLS

* * * FURTHER OPTIMIZATION POSSIBLE
DESIGN METHODOLOGY

* * * DIVIDE-AND-CONQUER APPROACH

* * * DIVIDES THE TASK OF NL PROCESSING
INTO A SEQUENCE OF SUB-PROBLEMS

* * * DEFINES PRECISE INTEGRATION

* * * SOLVES INDIVIDUAL PROBLEMS

* * * INTEGRATES THEM INTO A FUNCTIONAL SYSTEM

* * * FUNCTIONAL DECOMPOSITION

* * * EACH MODULE PERFORMS A SINGLE TASK

* * * MODULE SIZE DEPENDS ON FUNCTION

* * * USES SOFTWARE TOOLS WHERE POSSIBLE
DESIGN METHODOLOGY (CONT'D)

* * * TOP-DOWN INTEGRATION IS USED

* * * CONVENIENCE OF UPDATES/IMPROVEMENTS

* * * EFFICIENT DESIGN

* * * ERRORS ISOLATED IN SINGLE MODULES

* * * INTEGRATION PROCEDURE

* * * COMMON QUERY REPRESENTATION
AMONG DIFFERENT MODULES

* * * EACH MODULE IS VIEWED AS A "BLACK BOX"

* * * SEQUENTIAL PROCESSING ORGANIZATION

* * * PROVISION IS MADE FOR ERROR SIGNALS
• * * PROVISION IS MADE FOR ERROR SIGNALS 
SPECIFIC NLQS OBJECTIVES

* * * KNOWLEDGE STORAGE, RETRIEVAL, ACQUISITION
AND UTILIZATION CAPABILITIES

* * * GRAMMATICAL AND LEXICAL CONSTRUCTS
HANDLING CAPABILITIES

* * * SYNTACTIC HANDLING CAPABILITIES

* * * SEMANTIC HANDLING CAPABILITIES

* * * ELLIPTIC QUERY HANDLING AND
GENERAL ERROR REPORTING CAPABILITIES

KARL NL PROCESSING CAPABILITIES

* * * KNOWLEDGE CAPABILITIES

* * * KNOWLEDGE ACQUISITION

* * * AT DEVELOPMENT TIME

* * * AT SETUP TIME

* * * DURING ACTUAL USE

* * * KNOWLEDGE REPRESENTATION

* * * FRAME-BASED DYNAMIC KNOWLEDGE

* * * RULE-BASED STATIC KNOWLEDGE

* * * KNOWLEDGE UTILIZATION

* * * IN ALL ASPECTS OF QUERY PROCESSING

* * * EMBEDDED IN MODULES
* * * GRAMMAR/LEXICAL ANALYSIS CAPABILITIES

* * * DETERMINES WORD TYPES

* * * PERFORMS QUERY CLEAN-UP

* * * GENERATES DATA STRUCTURES

* * * SYNTACTIC VERIFICATION CAPABILITIES

* * * OPERATES ON A SINGLE DATA STRUCTURE

* * * A VARIATION OF NETWORK GRAMMARS
IS USED (RECURSIVE TRANSITION GRAMMARS)

* * * DIFFERENT RTN FAMILIES HANDLED

* * * APPLICATION-INDEPENDENT
PROCEDURE IS USED

* * * CAPABLE OF RESOLVING AMBIGUITIES
KARL NL PROCESSING CAPABILITIES

* * * SEMANTIC VERIFICATION

* * * LINGUISTIC SEMANTICS

* * * NOUN/VERB PHRASES

* * * ADJECTIVE HANDLING

* * * ELLIPSIS/AMBIGUITY HANDLING

* * * DB VERIFICATION

* * * QUERY SEMANTICS

* * * INTEGRITY CONSTRAINTS

* * * LEARNING CAPABILITIES

* * * UPDATE APPLICATION KNOWLEDGE

* * * PROVIDE CUSTOMIZED PROCESSING

* * * ELLIPSIS AND AMBIGUITY CAPABILITIES

* * * MISSING TERMS

* * * USER CAN SUPPLY MISSING PARTS
OVERVIEW OF THE QUERY PROCESSING CYCLE

* * * LEXICAL/GRAMMAR ANALYSIS

* * * IDENTIFY TOKENS/TYPES

* * * REPLACE SYNONYMS/REMOVE NOISEWORDS

* * * GENERATE DATA STRUCTURES

* * * SYNTACTIC ANALYSIS AND VERIFICATION

* * * SUBMIT TOKEN TYPE LIST TO VERIFIER

* * * RECEIVE PATTERN FAMILY IDENTIFIER,
OR ERROR CODE (IF ERROR)

* * * USE PATTERN IDENTIFIER FOR FURTHER
QUERY PROCESSING
OVERVIEW OF THE QUERY PROCESSING CYCLE (CONT'D)

* * * SEMANTIC VERIFICATION

* * * VERIFY LINGUISTIC SEMANTIC CORRECTNESS

* * * VERIFY DATABASE SEMANTIC CORRECTNESS

* * * RESOLVE AMBIGUITIES/ELLIPSES

* * * FORMAL QUERY GENERATION

* * * TRANSFORM TOKEN AND IDENTIFIER LISTS
INTO GENERIC QUERY FORMAT

* * * GENERATE HOST DBMS QUERY

* * * FORMAL QUERY EVALUATION

* * * OPEN DATABASE

* * * EXECUTE QUERY

* * * CLOSE DATABASE


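OVERVIEW OF THE QUERY PROCESSING CYCLE (CONT'D)

        Input Query
             |
             v
   +-------------------------+
   | LEXICAL ANALYSIS        |
   +-------------------------+
             |
             v
   +-------------------------+
   | SYNTAX VERIFIER         |
   +-------------------------+
             |
             v
   +-------------------------+
   | SEMANTIC VERIFIER       |
   +-------------------------+
             |
             v
   +-------------------------+
   | FORMAL QUERY GENERATION |
   +-------------------------+
             |
             v
   +-------------------------+
   | QUERY EVALUATION        |
   +-------------------------+

Knowledge sources shown beside the stages (top to bottom):
   Syntax Knowledge, Schema Knowledge
   Semantic Knowledge, Schema Knowledge
   Schema Knowledge, Formal Syntax Knowledge, Formal Semantic Knowledge
   Formal Syntax Knowledge, Formal Semantic Knowledge, DBMS-Specific Knowledge

THE NL QUERY PROCESSING CYCLE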
DATA STRUCTURES

* * * NL QUERY: LINKED LISTS

* * * TOKEN LIST

* * * TOKEN IDENTIFIER LIST

   TOKEN LIST                 TOKEN IDENTIFIER LIST

   +-----+-------+            +-----+------+
   | NO. | token |            | NO. | type |
   +-----+-------+            +-----+------+
         |                          |
         v                          v
   +-----+-------+            +-----+------+
   | NO. | token |            | NO. | type |
   +-----+-------+            +-----+------+
         |                          |
         v                          v
        ...                        ...

STRUCTURE OF NL QUERY STORAGE AREA
DATA STRUCTURES

* * * SAMPLE TOKENS AND TOKEN IDENTIFIERS:

NL QUERY       FORMAL QUERY             TOKEN PATTERN
               (with no noisewords)

print          print                    Verb
all
students       student                  Noun
taking         enroll                   Verb
"CMPS351"      CMPS351                  Literal
and            and                      Boolean
living         live                     Verb
in
Lafayette      Lafayette                Literal

* * * LINKED LIST BASED IMPLEMENTATION
DATA STRUCTURES (CONT'D)

DICTIONARY
CONTAINS LIST OF ALL KNOWN WORDS AND THEIR TYPES

NOUN TABLE
CONTAINS LIST OF ALL KNOWN NOUNS, EITHER
RELATION NAMES OR ATTRIBUTES

SYNONYMS TABLE
CONTAINS SYNONYMS AND EQUIVALENT TERMS

VERBS TABLE
CONTAINS VERBS AND RELATED NOUNS

ADJECTIVES TABLE
CONTAINS ADJECTIVES AND ASSOCIATED
PROPERTIES ASSIGNED TO NOUNS

MULTIPLE SEQUENCE PATTERNS TABLE
CONTAINS NOUN SEQUENCES MAPPED TO
SINGLE NOUNS IN THE KNOWLEDGE BASE



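DATA STRUCTURES (CONT'D)

Noun Frame

+------+------+-----------+-----+-----+
| Name | Type | Data type | Max | Min |
+------+------+-----------+-----+-----+
+---------+------+
| Pattern | Unit |
+---------+------+

Synonyms Representation

+------+------------+
| Term | Stands for |
+------+------------+

Verbs Representation

+------+---------+--------+
| Verb | Subject | Object |
+------+---------+--------+

Adjective Representation

+-----------+------+------------------+
| Adjective | Noun | Implied property |
+-----------+------+------------------+

Dictionary Representation

+------+-----------+
| Word | Word type |
+------+-----------+

Multiword Representation

+------+---------+------+
| Term | Pattern | Rank |
+------+---------+------+

DYNAMIC KNOWLEDGE REPRESENTATION SCHEMA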
LEXICAL AND GRAMMAR ANALYSIS

* * * LEXICAL ANALYSIS

* * * IDENTIFY TOKENS

* * * ATTACH TOKEN IDENTIFIERS

* * * GRAMMAR TRANSFORMATIONS MAY BE NEEDED

* * * REPLACE SYNONYMS/REMOVE NOISEWORDS

   +------------------------------+
   | Read NL Query                |
   +------------------------------+
                  |
                  v
   +------------------------------+
   | Replace Multiple             |
   | Sequence Patterns            |
   +------------------------------+
                  |
                  v
   +------------------------------+
   | Generate Tokens              |
   +------------------------------+
                  |
                  v
   +------------------------------+
   | Replace Synonyms and         |
   | Noisewords                   |
   +------------------------------+

LEXICAL ANALYSIS OF INPUT NL QUERY
LEXICAL AND GRAMMAR ANALYSIS (CONT'D)

* * * GRAMMAR ANALYSIS

* * * IF WORD IS KNOWN, THEN PROCEED

* * * USE RULES TO DETERMINE WORD TYPE

* * * QUERY USER IF UNKNOWN

* * * RULES ENCODED AS "C" FUNCTIONS

            token
              |
              v
   +-------------------+   Y   +----------------+
   | Is word in        |------>| Get next token |<------+
   | dictionary?       |       +----------------+       |
   +-------------------+                                |
              | N                                       |
              v                                         |
   +-------------------+   Y   +------------+           |
   | Is counter at end |------>| Query user |-----------+
   | of rules yet?     |       +------------+
   +-------------------+
         | N       ^
         v         |
   +-------------------+
   | Apply next rule   |
   +-------------------+

GRAMMAR ANALYSIS OF INPUT QUERY
SYNTACTIC ANALYSIS

* * * VERIFIES CORRECTNESS OF NL QUERY BASED ON
SYNTACTIC CRITERIA

* * * MEANING OF ENTITIES NOT CONSIDERED

* * * NETWORK-BASED GRAMMAR

* * * TOKEN TYPES CURRENTLY SUPPORTED:

* * * NOUNS (N)

* * * ADJECTIVES (A)

* * * BOOLEAN OPERATORS (B)

* * * RELATIONAL OPERATORS (R)

* * * SYNONYMS (S)

* * * VERBS (V)

* * * LITERALS (L)
* * * TOKEN SEQUENCES (PATTERNS)

* * * VERIFY RELATIVE POSITION OF TOKENS

* * * DIFFERENT PATTERN FAMILIES REPRESENTED

* * * EXAMPLE:

V (NB?)+ (VLB?)+      print names of students
                      that live in "Dallas"

V (NB?)+ (NR+LB?)+    print names of faculty
                      with salary of more than 24,000

V (AN)+               print the good students

V (VLB?)              who is working in "Dallas"?
                      ("who" is replaced by
                      "retrieve name")

(a)    groups construct "a" (the group can then be repeated)
a+     one or more occurrences of construct "a"
a?     construct "a" is optional
a*     zero or more occurrences of construct "a"

SAMPLE PATTERNS AND QUERIES
SYNTACTIC ANALYSIS (CONT'D)

* * * IMPLEMENTS THE RTN VERIFIER USING A
FINITE STATE AUTOMATON (REGULAR
EXPRESSION RECOGNIZER)

* * * ACCEPT/REJECT STATES ONLY

* * * 11 PATTERN FAMILIES SUPPORTED

* * * IF NO PATTERN MATCHES, QUERY IS REJECTED

* * * FINITE STATE AUTOMATON IMPLEMENTED
THROUGH THE "LEX" LEXICAL ANALYZER
GENERATOR SOFTWARE TOOL

"LEX" ACCEPTS REGULAR EXPRESSION
SPECIFICATIONS AND GENERATES C SOURCE
CODE FOR A FINITE STATE RECOGNIZER
BASED ON THE SPECIFICATIONS
SYNTACTIC ANALYSIS (CONT'D)

* * * LEX DESCRIPTION FOR SAMPLE RECOGNIZER:

    %%
    [A-Za-z][A-Za-z0-9_]*    { return IS_VARIABLE; }
    -?[0-9]+                 { return IS_INTEGER; }
    -?[0-9]+\.[0-9]+         { return IS_FLOATING; }
    [-+*/%]                  { return IS_OPERATOR; }
    %%

LEX CONSTRUCTS

A-Z      inside [ ], matches a single uppercase character
a-z      inside [ ], matches a single lowercase character
0-9      inside [ ], matches a single digit
[...]    matches any one character of the enclosed class
.        matches any character (except newline)
*        zero or more repetitions
+        one or more repetitions
$        indicates end of line
?        optional element

SAMPLE LEX RECOGNIZER AND LEX CONSTRUCTS
SEMANTIC ANALYSIS

* * * LINGUISTIC ANALYSIS

* * * NOUN PHRASE VERIFICATION

* * * VERB PHRASE VERIFICATION

* * * AMBIGUITY RESOLUTION

* * * ELLIPSIS/PLETHORA HANDLING

* * * PROCESS FLOW DIAGRAM:

        Token Flow
            |
            v
   +------------------------+
   | Ellipsis               |
   +------------------------+
            |
            v
   +------------------------+
   | Plethora               |
   +------------------------+
            |
            v
   +------------------------+
   | Ambiguity              |
   +------------------------+
            |
            v
   +------------------------+
   | Verb Phrase Proc.      |
   +------------------------+
            |
            v
   +------------------------+
   | Noun Phrase Proc.      |
   +------------------------+
            |
            v
* * * DB VERIFICATION

* * * LITERAL RANGES

* * * LITERAL PATTERNS

* * * OPERATORS

* * * OTHER INTEGRITY CONSTRAINTS

* * * IS-A MATCHES (RELATIONSHIP MEMBERSHIP)

* * * PROCESS FLOW DIAGRAM:
SEMANTIC ANALYSIS (CONT'D)

* * * IMPLEMENTED THROUGH "C" FUNCTIONS

* * * USES DYNAMIC KNOWLEDGE

* * * BOTH RULE- AND FRAME-BASED

* * * SAMPLE RULES:

IF   TOKEN(N) IS ADJECTIVE
THEN TOKEN(N+1) MUST BE NOUN, AND
     NOUN AND ADJECTIVE MUST AGREE
     AND HAVE AN ENTRY IN THE KB-ADJ.
ELSE ERROR = NO-NOUN-ADJ-AGREEMENT.

IF   TOKEN(N) IS VERB
THEN TOKEN(N-K), TOKEN(N+K) ARE NOUNS
     AND MUST AGREE WITH THE DEFINITION
     OF THE VERB IN THE KB-VERB.
ELSE ERROR = NO-VERB-NOUN-AGREEMENT.

IF   TOKEN(N) IS LITERAL
THEN TOKEN(N-K) IS THE NOUN ENTITY,
     SO VERIFY THAT THE LITERAL RANGE
     IS ACCEPTABLE.
ELSE ERROR = LIT-OUT-OF-RANGE.
FORMAL QUERY GENERATION AND EVALUATION

* * * A RELATIVELY SIMPLE TASK, AS THE NL QUERY
IS BEING "FORMALIZED" THROUGHOUT THE
PROCESSING CYCLE

* * * DETERMINE DOMAINS/RANGES OF ATTRIBUTES

* * * DETERMINE TYPE OF OPERATION REQUESTED
(COUNT, EXIST, RETRIEVE, ETC.)

* * * SELECT ATTRIBUTES TO BE RETRIEVED

* * * STRUCTURE THE CONDITIONALS LIST TO
CONFORM WITH THE "SELECT-FROM-WHERE"
GENERIC QUERY FORMAT

* * * CREATE GENERIC "SELECT-FROM-WHERE"
QUERY AND DISPLAY IT TO THE USER


FORMAL QUERY GENERATION AND EVALUATION (CONT'D)

* * * VERIFY GENERIC QUERY FOR CORRECTNESS
(E.G., BOOLEAN OPERATORS MAY BE MISSING)

* * * GENERATE HOST DBMS-SPECIFIC FORMAL QUERY

* * * EXECUTE HOST DBMS-SPECIFIC QUERY

* * * DISPLAY RESULTS TO THE USER

GENERIC AND INGRES QUERY FORMATS:

"Blank" Format:

SELECT <attribute_list>
FROM <domain>
WHERE <conditional_list>

QUEL Format:

RANGE OF <abbrev_name> IS <domain>
RETRIEVE (<dot_attr_list>)
WHERE <dot_conditional_list>

(dot is the attribute domain prefix indicator,
e.g., <abbrev_name>.<attribute>)
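SYSTEM INTERCONNECTIONS

* * * INTERNAL MODULE CONNECTIONS:

                           Input NL Query
                                 |
                                 v
  +-------------+      +----------------------+
  | Knowledge   |------| Lexical and          |
  | Base Mgmt   |      | Grammar Analysis     |
  | System      |      +----------------------+
  +-------------+                 |
         |                        v
         |             +----------------------+
         +-------------| Syntax Analysis      |
         |             | and Verification     |
         |             +----------------------+
         |                        |
         |                        v
         |             +----------------------+     +----------+
         +-------------| Semantic Analysis    |---->| Error    |
                       | and Verification     |     | Handling |
                       +----------------------+     +----------+
                                  |
                                  v
                       +----------------------+
                       | Query Generation     |
                       +----------------------+
                                  |
                                  v
                       +----------------------+
                       | Query Evaluation     |
                       +----------------------+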
SYSTEM INTERCONNECTIONS (CONT'D)

* * * EXTERNAL SYSTEM CONNECTIONS

  UNIX
  +--------------------------------------------------+
  |                 The KARL System                  |
  +--------------------------------------------------+
        | low level                    | high level
        |                              |
  +--------------+          +--------------------------+
  | System Calls |          | INGRES Relational DBMS   |
  +--------------+          +--------------------------+
        |                              |
  xxxxxxxxxxxxx                   ::::::::::
  X Knowledge X                   :: data ::
  X   Base    X                   :: base ::
  xxxxxxxxxxxxx                   ::::::::::

  (All data paths bi-directional)
ANNOTATED EXAMPLES

QUERY 1:

show the students enrolled in "CMPS351" or "CMPS360"

LEXICAL ANALYSIS: show student enroll "CMPS351" or "CMPS360"

(ellipsis): show student enroll "CMPS351" or
            enroll "CMPS360"

PATTERN: Verb (Noun Bool?) (Verb Literal Bool?)*

SYNTACTIC ANALYSIS: Pattern Accepted, Pattern_No = 8

SEMANTIC ANALYSIS: enroll (student, course)
                   course PATTERN = "XXXX9999"
                   course Number = 351 < 699
                   course Number = 360 < 699

BLANK QUERY:

SELECT all /* default */
FROM student
WHERE (course = "CMPS351" |
       course = "CMPS360")

QUERY PROCESSED CORRECTLY





ANNOTATED EXAMPLES (CONT'D)

QUERY 2:

who is "000-4076-65"

LEXICAL ANALYSIS: retrieve name "000-4076-65"

(severe ellipsis): retrieve name "000-4076-65"

PATTERN: Verb (Noun Rel_op? Literal Bool?)+

SYNTACTIC ANALYSIS: Pattern Accepted, Pattern_No = 4

SEMANTIC ANALYSIS: Pattern "999-9999-99" matches ssn
REFORM: show student ssn "000-4076-65"
ssn PATTERN = "999-9999-99"

BLANK QUERY: SELECT name
             FROM student
             WHERE (ssn = "000-4076-65")

QUERY PROCESSED CORRECTLY
ANNOTATED EXAMPLES (CONT'D)

QUERY 3:

print names and addresses of all the rich faculty

LEXICAL ANALYSIS: print name address rich faculty

PATTERN: Verb (Noun Bool?)+ (Adjective Noun)+

SYNTACTIC ANALYSIS: Pattern Accepted, Pattern_No = 12

SEMANTIC ANALYSIS: name belongs to faculty
                   address belongs to faculty
                   rich := salary > 40,000

REFORM: print name address faculty
        salary > 40,000
        salary range acceptable

BLANK QUERY:

SELECT name, address
FROM faculty
WHERE salary > 40000

QUERY ACCEPTED
ANNOTATED EXAMPLES (CONT'D)

QUERY 4:

show students who live and work in "Lafayette"

LEXICAL ANALYSIS: show student live & work "Lafayette"

PATTERN MATCHED: NONE (although the sentence is correct)

SYNTACTIC ANALYSIS: Failed. Program could not parse
                    input sentence (no double-verb
                    pattern supported)

QUERY REJECTED
QUERY 5:

show the rich students

LEXICAL ANALYSIS: show rich student

PATTERN MATCHED: Verb (Noun Relop Literal Bool?)+
(severe ellipsis; matches after replacing "rich")

SYNTACTIC ANALYSIS: Pattern valid. Pattern_No = 4

SEMANTIC ANALYSIS: rich student: error.
                   Attribute "salary" not associated
                   with relation "student"

QUERY REJECTED
ANNOTATED EXAMPLES (CONT'D)

QUERY 6:

show the students enrolled in "CMPS999"

LEXICAL ANALYSIS: show student enroll "CMPS999"
PATTERN MATCHED: Verb (Verb Literal Bool?)+
SYNTACTIC ANALYSIS: Pattern valid. Pattern_No = 11

SEMANTIC ANALYSIS: enroll (student, class) OK
                   class pattern OK
                   class number out of range
                   (class number > 699)

QUERY REJECTED


EVALUATION OF OBJECTIVES

DETERMINE WHETHER THE GENERIC AND SPECIFIC OBJECTIVES
HAVE BEEN MET WITH THE PROPOSED DESIGN

GENERIC OBJECTIVES:

* * * ADAPTABILITY

* * * PORTABILITY

* * * REDUCED COMPLEXITY

* * * EFFICIENCY

GENERIC OBJECTIVES HAVE BEEN MET
THROUGH THE METHODOLOGY PRESENTED

SPECIFIC DESIGN OBJECTIVES:

* * * KNOWLEDGE STORAGE, RETRIEVAL, ACQUISITION
AND UTILIZATION CAPABILITIES

* * * GRAMMATICAL AND LEXICAL CONSTRUCTS
HANDLING CAPABILITIES

* * * SYNTACTIC HANDLING CAPABILITIES

* * * SEMANTIC HANDLING CAPABILITIES

* * * ELLIPTIC QUERY HANDLING AND
GENERAL ERROR REPORTING CAPABILITIES

* * * SPECIFIC DESIGN OBJECTIVES HAVE ALSO BEEN MET
BY FOLLOWING THE GUIDELINES SET BY THE
GENERIC DESIGN CRITERIA AND THE DESIGN
METHODOLOGY PRESENTED

* * * KARL 1.00 IS CAPABLE OF PROCESSING 60-65% OF THE
QUERIES SUBMITTED (ADJUSTED FOR TYPING AND SPELLING
ERRORS)
FUNCTIONAL EVALUATION

CRITERION                                               KARL

 1. Be able to access multiple databases                Y
    (i.e., retargetable within applications)
 2. Answer questions asked directly (e.g., "who")       Y
 3. Handle multiple files and relationships             Y
 4. Handle simple pronoun references                    N (a)
 5. Be able to handle ellipsis                          Y
 6. Provide report-generating facilities for the        N
    retrieved data (e.g., formats, graphs, etc.)
 7. Be able to extend the linguistic knowledge          Y
    of the system during program execution
 8. Handle null cases, indicating the                   N (b)
    condition(s) that failed
 9. Restate the user's query in English                 Y (c)
10. Handle spelling and typing errors                   N
11. Provide special functions for improving             N (b)
    the database capabilities
12. Provide semantic constraints in the dialogue        Y
    between the human and the machine, and handle
    errors such as plethora and ambiguity

(a) Item has been considered as a future extension
(b) Item not in the original design considerations
(c) The program restates the query semi-formally
CURRENT STATUS AND FUTURE WORK

CURRENT LIMITATIONS:

NESTED QUERIES

SPELLING CORRECTION

NULL QUERY HANDLING

PRONOUN REFERENCES

* * * DYNAMIC KNOWLEDGE BASE STATUS:

255 TOTAL KNOWN WORDS
8 VERBS
7 ADJECTIVES
20 FRAMES
27 MULTIPLE SEQUENCES
24 NOUNS
45 SYNONYMS

* * * CURRENT APPLICATION: UNIVERSITY DATABASE
CURRENT STATUS AND FUTURE WORK (CONT'D)

* * * FUTURE RESEARCH TOPICS:

* * * NESTED QUERY HANDLING

* * * PRONOUN REFERENCES

* * * SPELLING CORRECTION

* * * NULL QUERY HANDLING

* * * INTERFACE WITH OTHER SYSTEMS
(E.G., COMMON COMMAND LANGUAGE IS&R
FRONT END, OFFICE AUTOMATION SYSTEMS,
OR OTHERS)

* * * QUERY OPTIMIZATION
CONCLUSIONS

* * * SIGNIFICANCE OF THE THESIS:

* * * AN ALTERNATE DESIGN METHODOLOGY FOR NLQSS
WAS INTRODUCED

* * * DESIGN CONSIDERATIONS AND METHODOLOGY
APPLICABLE TO OTHER NL PROCESSING AREAS

* * * A FOUNDATION FOR FURTHER RESEARCH
AND DEVELOPMENT WAS PRESENTED

* * * FURTHER RESEARCH TOPICS WERE IDENTIFIED

* * * SOLUTIONS WERE PROPOSED FOR SUCH TOPICS,
USING THE CURRENT PROTOTYPE AS A FOUNDATION
* * * NO NEED TO EMULATE OR SIMULATE NATURE

* * * AN INVENTING RATHER THAN AN IMITATING
APPROACH IS NEEDED

* * * FUNCTIONAL EQUIVALENCE CAN OBTAIN RESULTS
SIMILAR TO THOSE OF SIMULATION/EMULATION,
USING CONVENTIONAL TOOLS AND TECHNIQUES

* * * FUNCTIONAL DECOMPOSITION CAN ASSIST IN
REDUCING COMPLEX PROBLEMS TO PROBLEMS
OF WORKABLE SIZE

* * * TECHNIQUES EXIST FOR SOLVING SMALLER
PROBLEMS (COMPILER METHODS, SOFTWARE TOOLS,
ARTIFICIAL INTELLIGENCE, ETC.)
CONCLUSIONS (CONT'D)

* * * AN NLQS CAN PROVIDE THE FOUNDATION FOR OTHER
NL-BASED SOFTWARE SYSTEMS

* * * DEFINED FUNCTIONALITY OF EACH COMPONENT WILL
BE REQUIRED, WITH NO INTERDEPENDENCIES

* * * INTEGRATION TECHNIQUES WILL HAVE TO BE
DEVELOPED TO MERGE ALL NL-BASED COMPONENTS
INTO AN INTEGRATED ENVIRONMENT

* * * THEN, THE HUMAN-COMPUTER PROBLEM CAN BE
ADDRESSED AND SOLUTIONS PRESENTED